-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
TST[string]: update expecteds for using_string_dtype to fix xfails #61727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The JSON issues stem back to the fact that: >>> pd.Series([None, '', 'c']).astype(object) yields different behavior with/without the future string dtype. In the "old" world, this would preserve the value of In theory we could try and work around those semantics by natively supporting an object type in the JSON reader, but that's a ton of effort and I don't think worth it, given JSON does not natively support object storage |
thanks, will update those tests' |
Can't speak for the viability but I think this should be read-only per CoW. Is this related to the underlying |
I suspect we would need a mechanism like ndarray.flags for EAs to declare an object as read-only. Definitely out of scope for this PR. Looking at the test, the pertinent behavior can be tested by making the column specifically object dtype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot Brock, the fixes in the current diff all look good!
I quickly checked that one, and I am not entirely sure why we skipped it based on "doesn't work properly for arrow strings". The issue with the test seems to be that it is assuming too much about the input data, and its (can do a separate PR to fix this one, now that I already dived into it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to backport this into 2.3.x?
Looks like #61757 wasn't backported, so i dont think this one should be either |
It isn't 100% obvious that the new repr for Categoricals is an improvement, but it's non-crazy. One of the remaining xfails one is for
eval(repr(categorical_index))
round-tripping that won't be fixable unless we revert back to the old repr behavior.I'm pretty sure that the fix in test_astype_dt64_to_string is correct and the test is just wrong, but merits a close look.
That leaves 12 xfails, including the one un-fixable round-trip one that we'll just remove. Of those...
nan
instead ofNaN
. Not a huge deal but having mixed-and-matched nans/NaNs in the repr is weird.(Update: looks like I missed one in test_http_headers and another in test_fsspec)